Real-Time, Soft Robotic Patient Positioning System for Maskless
Head-and-Neck Cancer Radiotherapy*
Abstract - We present an initial examination of a novel
approach toward accurately positioning a patient during head
and neck intensity modulated radiotherapy (IMRT).
Position-based visual servoing of a radio-transparent soft
robot is used to control the flexion/extension cranial motion
of a manikin head. A Kinect RGB-D camera measures head
position, and the error between the sensed and desired
position is used to control a pneumatic system that regulates
pressure within an inflatable air bladder (IAB). Results show
that the system is capable of controlling head motion to
within 2 mm with respect to a reference trajectory. This
establishes proof-of-concept that using multiple IABs and
actuators can improve cancer treatment.

Index Terms - Life Sciences and Health Care; Mechatronics; 
Emerging Topics in Automation 

I. Introduction

This paper presents a systematic initial examination of 
an image-guided soft robot patient positioning system for 
use in head and neck (H&N) cancer radiotherapy (RT). 
H&N cancers are among the most fatal of major cancers. 
In 2014, 1,665,540 new patients developed pharynx and oral 
cavity cancers which led to 585,720 deaths in the United 
States [1]. Treating these cancers often involves intensity
modulated radiotherapy (IMRT), where a patient lies on a
6-DOF movable treatment couch and lasers or image-guiding
systems are used to ensure the patient is in the proper 
position. A linear accelerator (LINAC) is used to accelerate 
electrons in a wave guide to enable collision of electrons with 
a heavy metal target. High-energy x-rays produced from the 
collisions are shaped by multileaf collimators as they exit 
the gantry of the machine to conform to the shape of the 
patient’s tumor. The beam that emerges can be directed to a 
tumor from any angle by rotating the gantry and moving the 
couch. 

IMRT requires accurate patient positioning so that a highly
potent radiation dose is delivered to the tumor while sparing
critical organs nearby. An examination of the dosimetric
effects of patient displacement and of collimator and gantry
angle misalignment during IMRT showed high sensitivity
to small perturbations: a 3-mm error in the anterior-posterior
direction caused a 38% decrease in minimum target dose
or a 41% increase in the maximum spinal cord dose [2].
Treatment discomfort and severe pain often result from long
hours of minimally invasive surgery in which the skull is fixed
with pins for head immobilization during stereotactic
radiosurgery (SRS). In addition, conventional linear accelerators
(LINACs) used at most cancer centers are insufficient for the
high geometric accuracy and precision required of SRS for
isocenter localization [3].

Image-guided radiotherapy (IGRT) has made progress in 
improving accuracy while reducing set-up times [3], [4], [5]. 
The Robotic Tilt Module (RTM) interfaced with an image- 
guidance system [3], [6] enables high-precision positional 
correction by automatically aligning the patient when the 
image-guidance system detects positional errors. However,
the power of IGRT has not been fully explored due to the
limited degrees of freedom of couch motion. State-of-the-art
couches can only correct rigid errors and cannot compensate
for curvature changes, which often occur in neck positioning.
Also, patient motions are often ignored during image-guidance
procedures, where the focus is on the use of images
only before treatment.

The overall goal of this work is to address non-rigid
motion compensation during H&N RT. For an initial
investigation, we control one degree of freedom: raising
or lowering the head of a generic patient, lying in a supine
position, to a desired height above a table. The system
consists of a single inflatable air bladder (IAB), a mannequin
head and a neck/torso motion simulator, a Kinect Xbox 360
sensor, two pneumatic valve actuators controlled by
custom-built current regulators, and a National Instruments myRIO
microcontroller. The Kinect sensor is mounted directly above
the head for displacement measurement. The error between
the measured and desired head position, as sensed by the
camera, is used in a PI controller nested within a PID
feedforward to control the pneumatic actuator valves, thereby
regulating air pressure within the IAB and moving the
patient's head.

Soft robot systems are deformable polymer enclosures
with fluid-filled chambers that enable manipulation and
locomotion tasks by proportional control of the amount of fluid
in the chamber [11], [12]. Their customizable, deformable
nature and compliance make them suitable for biomedical
applications, as opposed to rigid and stiff mechanical robot
components, which are impractical for articulating human
body parts. Our final design is a deformable IAB and a
soft-robotic actuator chosen specifically to address the problem
of deflection or attenuation of radiation beams.

The paper is structured as follows: Section II gives an
overview of the system design and hardware set-up; Section
III details the vision algorithm used to determine the position
of the patient's head; Section IV describes the identification
of the soft robot model, and the control system design is
presented in Section V. Experimental results are presented
in Section VI, and we discuss future work and conclude the
paper in Section VII.

Fig. 1: Experimental Testbed

II. System Design

The system set-up is shown in Fig. 2. The patient simulator
is a Lexi cosmetology mannequin head (11" high, 6" wide)
with a hollow base that allows for a placeholder clamp. To
simulate torso-induced neck motion, we attach a ball joint
in the hollow base of the head. The soft robot actuation
mechanism combines an inflatable air bladder (19" x 12")
made of lightweight, durable and deformable polyester and
PVC, two current-controlled proportional solenoid valves
(Model PVQ33-5G-23-01N, SMC Co., Tokyo, Japan), and a
pair of silicone rubber tubes (attached to a T-port connector
at the orifice of the IAB) that convey air in and out of the
IAB. A 1 HP air compressor supplied regulated air at 30 psi to
the inlet actuating valve, while an interconnection of a 60 W
micro-diaphragm pump and a PVQ valve removed air from
the outlet terminal of the IAB. The diaphragm pump creates
the minimum operational differential pressure required by
the outlet valve.

We mount a Microsoft Kinect RGB-D camera approximately
710 mm above the manikin head, with the IAB
fully deflated. A medical pillow surrounds the head to
reduce scattering of infra-red wavelengths caused by the hair
on the mannequin head, improve image processing for face
extraction and negate undesirable head rotations. The vision
algorithm was implemented on a DELL Precision laptop
with 32 GB RAM that ran 64-bit Windows 7.1 on an Intel Core
i7-4800MQ processor. The real-time control processing was
implemented on a National Instruments myRIO embedded
system running LabVIEW 2014.

III. Vision-Based Head Position Estimation 

The Kinect camera, though insufficient for clinical use,
is reasonable for development and laboratory testing. H&N
radiotherapy verification and validation experiments will
incorporate the high-precision VisionRT 3D surface^1 imaging
system, approved for clinical use and capable of capturing
a patient's position with sub-millimeter spatial and
sub-degree rotational accuracy. We use the near mode depth range
of the Kinect sensor, i.e. 400 mm - 3000 mm [13], and the
640 x 480 depth image resolution, and stream images at 30
frames per second. We adopted the Microsoft Kinect SDK
version 1.5.2 and OpenNI .NET framework [14], [15] for
rapid prototyping of the experimental testbed.

Fig. 2: Depth Constrained 3D Face Tracking of a human head (left)
and a mannequin head (right) using AAM.

An active appearance model (AAM) [16] was employed
for face tracking, as it is a fast and robust method that
uses statistical models of shape and gray-level appearance of
faces. We adopted Smolyanski et al.'s approach [17], which
uses depth data to constrain a 2D + 3D AAM fitting. The
approach in [17] was extended to a non-human object, i.e. the
mannequin head in Fig. 2, by initializing the face tracker with
a qualitatively determined region of interest. The face tracker
utilizes both depth and color data but computes 3D tracking
results in the video camera space. The video camera space
is a right-handed system with the Z-axis pointing towards
the face being tracked and the Y-axis pointing in the vertical
direction.

The points corresponding to the tip of the nose are fairly
invariant to movement of facial muscles. Therefore, the
Z-coordinates of points corresponding to the nose area were
averaged, and this average was used to determine the patient
position with respect to the origin of the camera frame. We
mapped this result to world space, i.e. the head's displacement
above the table, using the relation

$$ y(t) = y_m - y_h \tag{1} $$

where $y(t)$ is the displacement of the head from the table;
$y_m$ is the head displacement as measured by the camera;
$y_h$ is the mounting height of the camera above the table.

^1 Vision RT - AlignRT Real-Time Patient Tracking, Patient Set-up in Radiation Therapy

Fig. 3: Vision Flowchart Using the OpenNI .NET Assembly

The tracked head position value from (1) was transferred
from the vision processing workstation to the myRIO over a
local wireless network using the user datagram protocol
(UDP). We chose UDP over handshaking, dialog-based
connection-oriented transmission models because the
application is time-critical. The typical problem of dropped
packets with UDP-based connections is preferable for our goals
over delayed packets, which can occur in connection-based
protocols. Network-interface-level error checking and
correction was handled on the myRIO using the procedure
described in Fig. 3.
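The preference for dropped over delayed packets can be illustrated with a small sketch. The wire format below (a sequence number followed by the head height in millimetres) and all names are assumptions made for illustration; the paper does not specify the payload layout or the myRIO-side implementation, which ran in LabVIEW.

```python
import struct

# Hypothetical wire format: 4-byte unsigned sequence number followed by an
# 8-byte float (head height in mm), both in network byte order.
SAMPLE_FORMAT = "!Id"


def pack_sample(seq, height_mm):
    """Encode one head-position sample as a 12-byte UDP payload."""
    return struct.pack(SAMPLE_FORMAT, seq, height_mm)


def unpack_sample(payload):
    """Decode a payload produced by pack_sample into (seq, height_mm)."""
    return struct.unpack(SAMPLE_FORMAT, payload)


class LatestSampleReceiver:
    """Keeps only the newest sample; late or duplicated datagrams are dropped."""

    def __init__(self):
        self.last_seq = -1
        self.height_mm = None

    def accept(self, payload):
        seq, height_mm = unpack_sample(payload)
        if seq <= self.last_seq:
            return False  # stale datagram: ignore it rather than wait or reorder
        self.last_seq = seq
        self.height_mm = height_mm
        return True


def send_sample(sock, address, seq, height_mm):
    """Fire-and-forget transmission over an already-created UDP socket."""
    sock.sendto(pack_sample(seq, height_mm), address)
```

A sender would create its socket with `socket.socket(socket.AF_INET, socket.SOCK_DGRAM)` and call `send_sample` once per processed depth frame; the receiver never blocks waiting for a missing sequence number, so the controller always acts on the most recent measurement.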

A deterministic protocol was implemented on myRIO to 
prioritize transmission of Kinect measurement data and to 
eliminate synchronization errors between the server and the 
client - a common issue with the Windows operating system. 
To ensure the deterministic task does not monopolize other 
myRIO processor resources, a timing engine was employed 
with an execution rate equal to that of the depth image 
processing loop on the Windows workstation, i.e. 30 Hz.
Finally, a 20th-order nonrecursive, point-by-point
finite-impulse response (FIR) filter was employed to mitigate
measurement noise in the streamed data.
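A point-by-point FIR filter of this kind can be sketched as follows. The filter coefficients used on the myRIO are not given in the paper, so the uniform (moving-average) taps below are purely an assumption for illustration, as are the class and variable names.

```python
from collections import deque


class PointByPointFIR:
    """Non-recursive (FIR) filter that processes one sample per call."""

    def __init__(self, taps):
        self.taps = list(taps)
        # Delay line holding the most recent len(taps) inputs, newest first.
        self.history = deque([0.0] * len(self.taps), maxlen=len(self.taps))

    def step(self, sample):
        self.history.appendleft(sample)
        return sum(b * x for b, x in zip(self.taps, self.history))


ORDER = 20
# Assumed coefficients: a uniform moving average over ORDER + 1 taps.
moving_average = PointByPointFIR([1.0 / (ORDER + 1)] * (ORDER + 1))
```

Calling `moving_average.step(z)` once per received depth sample returns the filtered height. A 20th-order filter has 21 taps, so at 30 Hz the output reflects roughly the last 0.7 s of measurements.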

IV. System Identification and Modeling 

A reliable system model is necessary to design a stable 
controller with required time and precision characteristics. 

A. Data Collection 

With the regulated air canister providing a constant
pressure of 30 psi, a periodic, persistently exciting input current
in the form of a sawtooth waveform was used to excite the
inlet PVQ valve such that the experiment was open-loop
informative [18, p. 414]. Airflow out of the outlet valve
was kept constant by opening it to the mid-position of its
operating range. This varied the head position through an
open-loop inflation/deflation process of the IAB.

The current to the inlet valve, $u(t)$, was band-limited such
that it had no power above 10 Hz, i.e., the Nyquist frequency
of the valves, and its spectrum coincided with the spectrum of
the discrete time signal. The output signal, $y(t)$, is the height
of the head given by (1). We acquired 8,800 samples of the
input and output signals for data modeling, and a second set
of 8,800 samples was collected for model validation.


B. Data Pre-Processing and System Model Identification 

Consider a single input, single output relationship in the
form of a linear difference equation

$$ y(t) + a_1 y(t-1) + \cdots + a_n y(t-n) = b_1 u(t-1) + \cdots + b_m u(t-m). \tag{2} $$

Rewriting (2) such that it models a one-step-ahead predictor,
we have

$$ y(t) = -a_1 y(t-1) - \cdots - a_n y(t-n) + b_1 u(t-1) + \cdots + b_m u(t-m). \tag{3} $$

We want a model structure from the collected data set,
$Z^N = \{u(1), y(1), \ldots, u(N), y(N)\}$, parametrized by a
mapping from the set of all past inputs and outputs, $Z^{t-1}$,
to the space of the model outputs. Denote the model as
$\hat{y}(t|\theta)$:

$$ \hat{y}(t|\theta) = g(\theta, Z^{t-1}) \tag{4} $$

where $\theta$ is the set of estimated coefficients to satisfy (3),

$$ \theta = [a_1 \; \cdots \; a_n \;\; b_1 \; \cdots \; b_m]^T. \tag{5} $$
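As a small worked illustration of the one-step-ahead predictor in (3), the sketch below evaluates the prediction from the coefficient vector and the most recent past samples. The function name and argument layout are hypothetical, and any coefficient values passed to it are illustrative rather than taken from the identified model.

```python
def predict_one_step(a, b, y_past, u_past):
    """One-step-ahead prediction for the difference-equation model.

    a = [a_1, ..., a_n], b = [b_1, ..., b_m]
    y_past = [y(t-1), ..., y(t-n)], u_past = [u(t-1), ..., u(t-m)]
    """
    if len(y_past) != len(a) or len(u_past) != len(b):
        raise ValueError("history lengths must match the model orders")
    # Autoregressive part: -a_1 y(t-1) - ... - a_n y(t-n)
    ar_part = -sum(ai * yi for ai, yi in zip(a, y_past))
    # Input part: b_1 u(t-1) + ... + b_m u(t-m)
    input_part = sum(bi * ui for bi, ui in zip(b, u_past))
    return ar_part + input_part
```

For example, with the made-up first-order coefficients `a = [-0.5]` and `b = [2.0]`, a previous output of 4.0 and a previous input of 1.0 give a prediction of 0.5 * 4.0 + 2.0 * 1.0 = 4.0.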

The identification goal is to identify the best model in the set,
$Z^N$, guided by frequency distribution analysis. Removing
means and linear trends in collected data will minimize the
effects of disturbances that are above the frequencies of
interest to system dynamics, and will eliminate occasional
outliers and non-continuous records in collected data [18,
Ch. 3, pp. 414]. Therefore, acquired data was normalized
using

$$ u_{ave}(t) = u(t) - \bar{u}, \qquad y_{ave}(t) = y(t) - \bar{y} \tag{6} $$

where $\bar{u} = \frac{1}{N}\sum_{t=1}^{N} u(t)$ and
$\bar{y} = \frac{1}{N}\sum_{t=1}^{N} y(t)$ are the corresponding
sample means, $t$ is the discrete time index and $N$
is the total data length [18, Ch. 1]. Linear trends were then
removed using

$$ u_d(t) = u_{ave}(t) - A\theta_u, \qquad y_d(t) = y_{ave}(t) - A\theta_y \tag{7} $$

where $\theta_u$ and $\theta_y$ are the solutions to the least-square fit
equations

$$ (A^T A)\theta_u = A^T u, \qquad (A^T A)\theta_y = A^T y \tag{8} $$

and

$$ A^T = \begin{bmatrix} \frac{1}{N} & \frac{2}{N} & \cdots & \frac{N-1}{N} & 1 \\ 1 & 1 & \cdots & 1 & 1 \end{bmatrix}. \tag{9} $$
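The mean removal and linear detrending steps above can be sketched in plain Python. This is a minimal illustration under one assumption: the least-squares line is fitted to the mean-removed samples using the regressors $[t/N, 1]$ that make up $A$. The function names are hypothetical, and the authors' own processing was done with MATLAB tooling.

```python
def remove_mean(x):
    """Subtract the sample mean, as in (6)."""
    mean = sum(x) / len(x)
    return [v - mean for v in x]


def remove_linear_trend(x):
    """Subtract the least-squares line fitted on the regressors [t/N, 1]."""
    n = len(x)
    ramp = [(t + 1) / n for t in range(n)]  # first column of A: 1/N ... 1
    # Entries of the 2x2 normal equations (A^T A) theta = A^T x.
    s_rr = sum(r * r for r in ramp)
    s_r = sum(ramp)
    s_rx = sum(r * v for r, v in zip(ramp, x))
    s_x = sum(x)
    det = s_rr * n - s_r * s_r
    slope = (s_rx * n - s_r * s_x) / det
    offset = (s_rr * s_x - s_r * s_rx) / det
    return [v - (slope * r + offset) for r, v in zip(ramp, x)]


def detrend(x):
    """Mean removal followed by linear-trend removal."""
    return remove_linear_trend(remove_mean(x))
```

Applying `detrend` to a record that is exactly a straight line returns values that are zero up to rounding, which is a quick way to confirm the normal equations are solved correctly. The record needs at least two samples for the 2x2 system to be solvable.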


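To examine the relationship between the input and output
signals, the normalized cross-correlation function (CCF)
between $u(t)$ and $y(t)$ was determined as

$$ \psi_{uy}(\tau) = \frac{\sum_{t=1}^{N} [u(t-\tau) - \bar{u}]\,[y(t) - \bar{y}]}{\sqrt{\sum_{t=1}^{N} [u(t) - \bar{u}]^2 \sum_{t=1}^{N} [y(t) - \bar{y}]^2}}, \qquad \tau = 0, \pm 1, \cdots, \pm(N-1). \tag{10} $$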
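The normalized CCF in (10) can be evaluated at a single lag with a few lines of Python. This is a sketch, not the authors' implementation: the function name is hypothetical, and terms whose shifted index falls outside the recorded data are simply skipped, which is an assumption about how the record edges are handled.

```python
import math


def normalized_ccf(u, y, lag):
    """Normalized cross-correlation of u and y at an integer lag.

    Pairs u(t - lag) with y(t); terms whose index falls outside the
    record are skipped.
    """
    n = len(u)
    if len(y) != n:
        raise ValueError("u and y must have the same length")
    u_mean = sum(u) / n
    y_mean = sum(y) / n
    num = 0.0
    for t in range(n):
        k = t - lag
        if 0 <= k < n:
            num += (u[k] - u_mean) * (y[t] - y_mean)
    den = math.sqrt(sum((v - u_mean) ** 2 for v in u)
                    * sum((v - y_mean) ** 2 for v in y))
    return num / den
```

If the output is a copy of the input delayed by one sample, the CCF peaks at a lag of one, which is how a dead time shows up in this kind of analysis.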
























[Plot title: Auto-Correlation Function of Residuals for Pre-Whitened Input Data]

Fig. 4: CCF of Input and Output Signals

Fig. 5: Impulse Response Correspondence with the CCF (block
diagram: operating input u(t), impulse response h(t), measured
output y(t))


Since the CCF is the convolution of the system impulse
response, $h(t)$, and the process auto-correlation function,
the Wiener-Hopf equation can be rewritten as

$$ \psi_{uy}(\tau) = \int h(\nu)\, E[u(t)\,u(t+\tau-\nu)]\, d\nu = \int h(\nu)\, \psi_{uu}(\tau-\nu)\, d\nu \tag{11} $$

where $E$ denotes the expectation operator. Equation (11)
implies the CCF between the output and test input is
proportional to the system impulse response when the input
is a white noise signal [18, p. 13]. The CCF in Fig. 4
is not a correct estimate of the system impulse response,
since the excitation input was not a white noise sequence.
Therefore, the input and output were prewhitened with a
white noise input sequence, $u_{pw}(t) = u(t)F(z^{-1})$, where
$u_{pw}(t)$ is a zero-mean white input sequence and $F(z^{-1})$ is
an autoregressive filter of order 20 defined as

$$ F(z^{-1}) = 1 + c_1 z^{-1} + c_2 z^{-2} + \cdots + c_{20} z^{-20}. \tag{12} $$

The parameters $c_i$ ($i = 1, 2, \cdots, 20$) were estimated by
fitting an autoregressive model to $u(t)$ and were generated
with the 'ar' command in MATLAB.

The estimation result after fitting the white noise
autoregressive model was found to have a 76.75% fit to the estimation
data, with a mean squared error (MSE) of 78.11 mm^2. The
normalized auto-correlation function indicates the quality of
the filter $F(z^{-1})$ and is given by

$$ \psi_{uu}(\tau) = \frac{\sum_{t=1}^{N} [u(t) - \bar{u}]\,[u(t+\tau) - \bar{u}]}{\sum_{t=1}^{N} [u(t) - \bar{u}]^2} \tag{13} $$

where $\tau = 0, \pm 1, \cdots, \pm(N-1)$.

Fig. 7: Power Spectral Density Phase Plot of Detrended Input and
Output Signals

The auto-correlation function of the residuals of the
pre-whitened input signals, seen in Fig. 6, is within the 95%
confidence bands (the dashed red lines). Hence, we conclude
that we correctly estimated the filter. To find an optimal
sub-model for the identified system that will tolerate
non-linearities and handle disturbances well, our final choice
was a linear, second-order grey-box process model on the
detrended data with quality measurable by the MSE. This
model choice is informed by the previous impulse response
analysis, which suggests a delay in the system (Fig. ), and
gave an affordable model cost acceptable for solving
$\hat{\theta}_N$. A high-order complex model may be marginally
better but may not be worth the higher cost [18, §16.8].

By analyzing the Bode response of the spectral frequency
density distribution of the detrended data, we chose the
approximately linear frequency range (0.00232 rad/sec - 6.85
rad/sec) in the frequency distribution of Fig. 7 to represent
the desired model.
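The whiteness check on the pre-whitened input residuals described in this section can be sketched as below. The paper does not state how its 95% confidence bands were computed; the sketch assumes the common large-sample approximation of plus or minus 1.96 divided by the square root of N for a white sequence, and the function names are hypothetical.

```python
import math


def normalized_acf(x, lag):
    """Normalized auto-correlation of x at a non-negative integer lag."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag))
    den = sum((v - mean) ** 2 for v in x)
    return num / den


def fraction_within_band(x, max_lag):
    """Share of lags 1..max_lag whose ACF lies inside +/-1.96/sqrt(N)."""
    band = 1.96 / math.sqrt(len(x))
    inside = sum(1 for lag in range(1, max_lag + 1)
                 if abs(normalized_acf(x, lag)) <= band)
    return inside / max_lag
```

For a sequence that is close to white, most lags should fall inside the band. A strongly correlated signal, such as the raw sawtooth excitation, fails the check at every small lag, which is the reason pre-whitening is needed before reading the impulse response off the CCF.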

C. Model Estimation 

Using term selection and parameter estimation, we fit a
second-order process model with transfer function form

$$ G(s) = K_p \frac{1 + sT_z}{(1 + sT_{p_1})(1 + sT_{p_2})}\, e^{-sT_d} \tag{14} $$

where $T_z$, $T_{p_1}$, and $T_{p_2}$ are respectively the process zero and
the time constants contributed by the first and second pole of
the system; $K_p$ is the process dc gain, and $T_d$ is the process
dead time. The delay is a result of the non-collocation of the
sensor and actuator. The identified parameters of (14) are
listed in Table I.


TABLE I: Parameter Estimation Results for Soft Robot System

Kp       Tz         Tp1    Tp2      FPE     MSE       Td
1.0015   -0.58354   100    9.7257   1.672   0.05498   2


A first-order ARMA disturbance model for the measurement
noise component has been fit to $G(s)$:

$$ y(s) = G(s)u(s) + \frac{C(s)}{D(s)}\, e(s), \tag{15} $$

where $e(s)$ is white noise, $C(s) = s + 899.3$, and $D(s) =
s + 7.789$. A prediction focus was used to weigh the relative
importance of how closely to fit the data in the various
frequency ranges. This favored the fit over a short time
interval [22, Ch. 3, Sec. 3-38]. The model has an 87.35% fit
to the original data, with improved quality as the final
prediction error (FPE) and MSE show.

Fig. 8: Frequency analysis from past inputs to residuals

D. Residual Analysis

To verify the model accuracy with respect to our control
goal, we employed canonical analysis by computing the
prediction errors as a frequency response from the inputs
to the residuals not picked up by the model. Defining the
outputs predicted by the model as $\hat{y}(t|\hat{\theta}_N)$, the errors from
the modeling process are the residuals

$$ \varepsilon(t) = \varepsilon(t, \hat{\theta}_N) = y(t) - \hat{y}(t|\hat{\theta}_N). \tag{16} $$

Basic statistics for the residuals from the model, such as

$$ S_1 = \max_t |\varepsilon(t)|, \qquad S_2^2 = \frac{1}{N} \sum_{t=1}^{N} \varepsilon^2(t), \tag{17} $$

will inform us about the model's quality, since the upper limit
$S_1$ or the average error $S_2$ for all the data we have will
also bound the errors for all future data. In order to check
that the model would work for a range of possible inputs, we
study the covariance between residuals and past inputs

$$ \hat{R}^N_{\varepsilon u}(\tau) = \frac{1}{N} \sum_{t=1}^{N} \varepsilon(t)\, u(t - \tau) \tag{18} $$

and deem the model invariant to other inputs if the
numbers $\hat{R}^N_{\varepsilon u}(\tau)$ are small enough that $y(t)$ could not
have been better predicted, i.e., there is no part of $y(t)$ not
picked up by the model $G(s)$. We compare the estimates of
the obtained linear model with the corresponding standard
deviation (from the validation data set, $Z^N$) in Bode plots
with estimated variance translated to confidence intervals.
We see from Fig. 8 that the model's frequency response
generally stays within the 99% confidence bands (the pink
and purple lines), and conclude we have a reliable model.

V. Control Design

The step response of the open loop system (Fig. 9)
shows the system is stable, but with a very slow transient
response. We require a controller that gives closed loop
stability and achieves a clinically acceptable response time
(15 - 30 seconds) while balancing the trade-off between
aggressiveness and robustness. To do this, a pole was added
at the origin and a zero was kept close to the introduced
pole, as in Fig. 12, using the following PI controller in a
feedforward configuration with the obtained model:

$$ G_c = 3.79 + \frac{0.0344}{s}. \tag{19} $$

This reduced steady state error while maintaining transient
characteristics. The closed-loop unit step response with the
added controller is shown in Figure 10. The system's transfer
function with the added PI controller of (19) is

$$ G_c(s)G(s) = \frac{-0.00228\,(s + 0.009073)\,(s - 1.7137)\, e^{-2s}}{s\,(s + 0.01)\,(s + 0.1028)} \tag{20} $$

where the delay was approximated with a second-order Padé
approximant of the form


$$ H(s) = \frac{s^2 - 3s + 3}{s^2 + 3s + 3}. \tag{21} $$
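The coefficients in (21) follow from the standard second-order Padé approximant of a pure delay, $e^{-T_d s} \approx (1 - T_d s/2 + T_d^2 s^2/12)/(1 + T_d s/2 + T_d^2 s^2/12)$, rescaled so that the leading coefficient is one. The sketch below reproduces them for a dead time of 2 s (the value this check assumes for $T_d$); the function names are hypothetical.

```python
import cmath


def pade2_delay(td):
    """Second-order Pade approximant of exp(-td * s).

    Returns (numerator, denominator) coefficients in descending powers
    of s, scaled so that the s^2 coefficient is 1.
    """
    if td <= 0:
        raise ValueError("dead time must be positive")
    c1 = 6.0 / td
    c0 = 12.0 / td ** 2
    return [1.0, -c1, c0], [1.0, c1, c0]


def evaluate(poly, s):
    """Evaluate a polynomial (descending powers) at a complex point s."""
    result = 0j
    for coeff in poly:
        result = result * s + coeff
    return result


def approximation_error(td, omega):
    """Magnitude of the gap between the approximant and the true delay."""
    num, den = pade2_delay(td)
    s = 1j * omega
    return abs(evaluate(num, s) / evaluate(den, s) - cmath.exp(-td * s))
```

Calling `pade2_delay(2.0)` returns the numerator `[1.0, -3.0, 3.0]` and denominator `[1.0, 3.0, 3.0]`, matching (21). The approximant is all-pass, so only the phase is approximated, and the error grows with frequency.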


This preserved the transient characteristics by sufficiently
approximating the delay according to our control goal. The
overall desired transient and frequency response was then
realized with a feedforward PID controller in series with the
closed-loop network of the PI-controlled soft robot system.
The PID controller, given by

$$ G_{PID} = 3.4993 + \frac{[\ldots]}{s} + 55.8988\,s, \tag{22} $$

corrected fluctuations in air flow into the IAB and improved
the system's dynamic performance such that the overall
closed-loop network has the step response seen in Fig. 11.
This produced a non-minimum phase system with a settling
time of approximately 14 seconds. As seen in Fig. 11, the
system converges to steady state with a rise time of 6.29
seconds. The overall PID-PI control network (shown in Fig.
12) is closed-loop stable, as the Bode plot of Fig. 13 shows.
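As an arithmetic cross-check of the PI-compensated transfer function (20), the zero-pole-gain factors of the product of the PI controller (19) and the process model (14) can be computed directly. The sketch assumes the Table I entries are read in the order Kp, Tz, Tp1, Tp2 and leaves the dead-time term aside; the function name is hypothetical.

```python
def pi_plant_factors(kp, tz, tp1, tp2, kc, ki):
    """Zero-pole-gain factors of Gc(s) * G(s) without the dead-time term.

    Plant: kp (1 + tz s) / ((1 + tp1 s)(1 + tp2 s)); controller: kc + ki / s.
    """
    # Leading gain after making every factor monic in s.
    gain = kc * kp * tz / (tp1 * tp2)
    zeros = sorted([-ki / kc, -1.0 / tz])
    poles = sorted([0.0, -1.0 / tp1, -1.0 / tp2])
    return gain, zeros, poles


gain, zeros, poles = pi_plant_factors(
    kp=1.0015, tz=-0.58354, tp1=100.0, tp2=9.7257, kc=3.79, ki=0.0344)
```

With these inputs the gain, the two zeros (one of them in the right half-plane, consistent with the non-minimum phase behavior noted above) and the three poles agree closely with the factors printed in (20).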







































Fig. 9: Open-Loop Step Response of Identified Model

Fig. 10: Closed-Loop Step Response of PI tuned Soft Robot System


VI. Experimental Validation 

The testbed of Fig. 1 was used to validate the proposed
model and control network. The control algorithm was
implemented using a second-order Runge-Kutta ordinary
differential equation solver that ran on the myRIO at a fixed
step size of 0.1 second. A fixed step solver was used to
avoid the loss of computational efficiency involved in
discretizing the controller and soft robot model, which were
both modeled in the continuous-time domain. To ensure the
deployment was executed in real-time, the timing source of
the execution loop on the Windows workstation was also
synchronized to the myRIO hardware.
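A fixed-step second-order Runge-Kutta integration of a continuous-time model can be sketched as follows. The paper does not say which second-order variant was used, so Heun's method is assumed here, and the first-order lag with a 10 s time constant is only an illustrative stand-in for the controller and soft robot dynamics.

```python
import math


def rk2_step(f, t, x, h):
    """One Heun (explicit trapezoidal) step for dx/dt = f(t, x)."""
    k1 = f(t, x)
    k2 = f(t + h, x + h * k1)
    return x + 0.5 * h * (k1 + k2)


def simulate(f, x0, t_end, h=0.1):
    """Integrate from t = 0 to t_end with a fixed step size h."""
    steps = int(round(t_end / h))
    t, x = 0.0, x0
    for _ in range(steps):
        x = rk2_step(f, t, x, h)
        t += h
    return x


# Illustrative first-order lag with a 10 s time constant driven by a unit step.
TAU = 10.0


def first_order_lag(t, x):
    return (1.0 - x) / TAU
```

With the 0.1 s step quoted above, `simulate(first_order_lag, 0.0, 10.0)` stays close to the exact value `1 - math.exp(-1.0)`, which illustrates why a fixed step well below the dominant time constants is adequate for real-time execution.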

Experimental results with constant reference tracking are
shown in Fig. 14. With a constant set-point target of
25.32 cm above the table, and the manikin head
24.51 cm above the table at rest, the algorithm was
deployed to track the set-point trajectory. The controller
behaves as expected, reaching within 2% of the reference
after a rise time of approximately 15 seconds and tracking the
set-point trajectory to within 0.2 cm maximum deviation. The
system also displays less overshoot and a clinically acceptable
settling time. A second experiment (Fig. 15) with changing
set-points was carried out. The controller tracks the set-point
trajectories with a maximum deviation of 2 mm from the
set-point.

The depth and range resolution of the Kinect Xbox sensor
accounts for the deviation from the set-point trajectory when
the controller is applied. The chosen set-points of Figs. 14
and 15 are extensible for use in target clinical applications
as a typical H&N RT may demand. Future multi-axis head
positioning work will explore the new time-of-flight based
Kinect for Windows v2 sensor, which has an improved noise
floor, visualizes small objects in greater detail and more
clearly, and offers a depth fidelity of 512 x 424 pixels and a
wider field of view (fov) of 70.6 x 60 degrees, compared with
the 320 x 240 pixels and 58.5 x 46.6 degrees fov of the Xbox
sensor used in this work.

Fig. 11: Closed Loop Step Response Plot of PID and PI Cascade
Network

Fig. 12: Block Diagram of Model

VII. CONCLUSIONS 

Accurate positioning of the patient head and torso is
crucial in intensity modulated radiotherapy. Deviations from
desired positions have been known to cause dose variation,
degraded treatment efficacy, brain necrosis and edema [3].
In this paper, the control of cranial flexion/extension motion
of a patient during maskless and frameless, image-guided
radiotherapy was considered using a manikin head as a test
subject. We established that the proposed soft robot can track
a desired step reference trajectory with 2 mm precision after
a lag time of 15 seconds. This was achieved using a PI
controller nested within a PID feedforward configuration and
implemented on an NI myRIO. The Kinect Xbox 360 sensor
sensed head position.

Fig. 13: Bode Response Plot of Feedforward and Cascade Control
Network

Fig. 14: Manikin head response to a constant setpoint

Fig. 15: Varying Set-points and Manikin Head Trajectory Tracking

This shows the possibility of accurate positioning with the
choice of a deliberate, well-tuned controller. Future efforts
will focus on designing a more accurate and robust controller
usable for clinical RT and on improving the transient
characteristics. We will also look into gain scheduling to allow
different settling times for different motions, as fast motions
may be uncomfortable for patients. Long term efforts include
extending the results to the deformable motions of the upper
torso and H&N. This would involve multiple bladders, finding
the coupling needed between IABs to give the desired actuation,
refining the system model for the bladders, and developing
a more accurate and robust controller, in order to achieve
multi-axis positioning irrespective of patient head shape or
size. This would demonstrate comprehensive and accurate
automated control of a patient's position during H&N cancer
radiotherapy, and prevent unwanted anatomical deformations
and other harmful effects that positioning deviations have been
known to cause.